Phantom-GRAPE: numerical software library to accelerate collisionless $N$-body simulation with SIMD instruction set on x86 architecture

Abstract

(Abridged) We have developed a numerical software library for collisionless N-body simulations named "Phantom-GRAPE", which greatly accelerates force calculations among particles by using AVX, a new SIMD instruction set extension to the x86 architecture and an enhanced version of SSE. In our library, not only Newtonian forces but also central forces with an arbitrary shape f(r) and a finite cutoff radius r_cut (i.e. f(r) = 0 at r > r_cut) can be computed quickly. Using an Intel Core i7-2600 processor, we measure the performance of our library for both kinds of force. For Newtonian forces, we achieve 2 × 10^9 interactions per second with 1 processor core, which is 20 times higher than the performance of an implementation without any explicit use of SIMD instructions, and 2 times higher than that with the SSE instructions. With 4 processor cores, we obtain 8 × 10^9 interactions per second. For the arbitrarily shaped forces, we can calculate 1 × 10^9 and 4 × 10^9 interactions per second with 1 and 4 processor cores, respectively. The performance with 1 processor core is 6 times and 2 times higher than that of the implementations without any use of SIMD instructions and with the SSE instructions, respectively. These performance figures depend only weakly on the number of particles, in contrast to force calculations accelerated by GPUs, whose performance depends strongly on the number of particles. Such weak dependence suits collisionless N-body simulations, since these simulations are usually performed with sophisticated N-body solvers such as Tree and TreePM methods combined with an individual timestep scheme. Collisionless N-body simulations accelerated with our library have a significant advantage over those accelerated by GPUs, especially in massively parallel environments.
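
To make the quantity being measured concrete, the following is a minimal scalar sketch of a direct-summation force kernel for a central force f(r) with a cutoff radius r_cut, as described in the abstract. It is not the Phantom-GRAPE API and uses no SIMD instructions; it corresponds roughly to the kind of unvectorized baseline the abstract compares against. The names `pairwise_accelerations`, `force_shape`, and `newton` are hypothetical, and the choice of units with G = 1 is an assumption made for illustration.

```python
import math


def pairwise_accelerations(pos, mass, force_shape, r_cut):
    """Direct O(N^2) summation of a central force with a finite cutoff.

    pos         : list of (x, y, z) tuples
    mass        : list of particle masses
    force_shape : hypothetical callable f(r), force magnitude per unit mass
    r_cut       : cutoff radius; pairs with r > r_cut contribute nothing
    """
    n = len(pos)
    acc = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        xi, yi, zi = pos[i]
        for j in range(n):
            if i == j:
                continue  # no self-interaction
            dx = pos[j][0] - xi
            dy = pos[j][1] - yi
            dz = pos[j][2] - zi
            r = math.sqrt(dx * dx + dy * dy + dz * dz)
            if r == 0.0 or r > r_cut:
                continue  # f(r) = 0 beyond the cutoff; skip coincident points
            # Scale the separation vector by m_j * f(r) / r (attractive force).
            s = mass[j] * force_shape(r) / r
            acc[i][0] += s * dx
            acc[i][1] += s * dy
            acc[i][2] += s * dz
    return acc


def newton(r):
    """Newtonian force shape, assuming units in which G = 1."""
    return 1.0 / (r * r)


# Two particles one unit apart; an infinite cutoff recovers the pure Newtonian case.
positions = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
masses = [1.0, 2.0]
acc_newton = pairwise_accelerations(positions, masses, newton, math.inf)
acc_cut = pairwise_accelerations(positions, masses, newton, 0.5)
```

Each pass through the inner loop with i ≠ j is one interaction, so this kernel evaluates N(N-1) interactions per call; dividing that count by the wall-clock time of the call gives an interactions-per-second figure of the kind quoted in the abstract.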